Cancer Prognosis Prediction Using Balanced Stratified Sampling
Authors
Abstract
High accuracy in cancer prediction is important for improving the quality of treatment and the survival rate of patients. As the volume of data in healthcare research grows rapidly, the analytical challenge is compounded. Using an effective sampling technique with classification algorithms tends to yield good prediction accuracy. The SEER public-use cancer database provides several prominent class labels for prognosis prediction. The main objective of this paper is to determine the effect of sampling techniques on classifying the prognosis variable and to propose an ideal sampling method based on the outcome of the experiments. In the first phase of this work, traditional random sampling and stratified sampling techniques were used. In the next phase, balanced stratified sampling, with variations according to the choice of prognosis class label, was tested. Much of the initial effort went into pre-processing the SEER data set. The classification models for the experiments were built on breast cancer, respiratory cancer and mixed cancer data sets with three traditional classifiers: Decision Tree, Naïve Bayes and K-Nearest Neighbour. Three prognosis factors (survival, stage and metastasis) were used as class labels for experimental comparison. The results show a steady increase in the prediction accuracy of the balanced stratified model as the sample size increases, whereas the traditional approach fluctuates before reaching its optimum results.
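The difference between proportional stratified sampling and a balanced variant can be illustrated with a minimal sketch. The abstract does not define the paper's exact balancing procedure, so this example assumes that "balanced" means drawing the same number of records from every class label. The helper names, the toy records and the label values `alive` and `dead` are hypothetical; only the idea of using `survival` as a class label comes from the abstract.

```python
import random
from collections import Counter, defaultdict


def group_by_label(records, label_key):
    """Group records (dicts) into strata keyed by their class label."""
    strata = defaultdict(list)
    for rec in records:
        strata[rec[label_key]].append(rec)
    return strata


def stratified_sample(records, label_key, n, rng):
    """Proportional allocation: each stratum contributes in proportion to its size.

    Because of rounding, the total may differ slightly from n.
    """
    strata = group_by_label(records, label_key)
    total = len(records)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / total)
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample


def balanced_stratified_sample(records, label_key, n, rng):
    """Equal allocation (assumed meaning of "balanced"): same count per class.

    The per-class count is capped by the size of the smallest stratum.
    """
    strata = group_by_label(records, label_key)
    smallest = min(len(members) for members in strata.values())
    per_class = min(n // len(strata), smallest)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_class))
    return sample


# Hypothetical, imbalanced toy data: 90 "alive" records and 10 "dead" records.
records = [{"id": i, "survival": "alive" if i < 90 else "dead"} for i in range(100)]
rng = random.Random(0)  # fixed seed so the draw is repeatable

prop = stratified_sample(records, "survival", 20, rng)
bal = balanced_stratified_sample(records, "survival", 20, rng)

print("proportional:", Counter(r["survival"] for r in prop))
print("balanced:    ", Counter(r["survival"] for r in bal))
```

On this toy data the proportional sample keeps the 90/10 imbalance (18 and 2 records), while the balanced sample contains 10 records of each class, so a classifier trained on it sees the minority class as often as the majority class.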
Similar resources
Cost-Sensitive Learning for Recurrence Prediction of Breast Cancer
Breast cancer is one of the top cancer-death causes and specifically accounts for 10.4% of all cancer incidences among women. The prediction of breast cancer recurrence has been a challenging research problem for many researchers. Data mining techniques have recently received considerable attention, especially when used for the construction of prognosis models from survival data. However, exist...
Fast balanced sampling for highly stratified population
Balanced sampling is a very efficient sampling design when the variable of interest is correlated to the auxiliary variables on which the sample is balanced. A procedure to select balanced samples in a stratified population has previously been proposed. Unfortunately, this procedure becomes very slow as the number of strata increases and it even fails to select samples for some large numbers of...
A stratified sampling technique based on correlation feature selection method for heart disease risk prediction system
In medicine, data mining methods can be used by physicians to improve clinical diagnosis. In this paper, a stratified approach named Correlation Feature Selection Stratified Sampling (CFS-SS) is introduced. This method is applied to a heart disease risk prediction system for medical diagnosis. With the proposed system, the attributes are grouped together into homogenous sub groups, bef...
Variance Estimation from Complex Surveys Using Balanced Repeated Replication
For estimating the variance of nonlinear statistics like regression and correlation coefficients in stratified sampling designs, the Balanced Repeated Replication (BRR) method has received special attention, although other procedures like linearization (Taylor’s series expansion method), Jackknife repeated replications and the Bootstrap method are also available in the literature. BRR method in...
Protein expression profiling and molecular classification of gastric cancer by the tissue array method.
PURPOSE: Gastric cancer is heterogeneous clinically and histologically, and prognosis prediction by tumor grade or type is difficult. Although previous studies have suggested that frozen tissue-based molecular classifications effectively predict prognosis, prognostic classification on formalin-fixed tissue is needed, especially in early gastric cancer. EXPERIMENTAL DESIGN: We immunostained 659 ...
Journal title:
- CoRR
Volume abs/1403.2950
Pages -
Publication date 2014